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Abstract: The analysis of computer and communication networks gives rise to 
some interesting inverse problems. This paper is concerned with active network 
tomography where the goal is to recover information about quality-of-service 
(QoS) parameters at the link level from aggregate data measured on end-to- 
end network paths. The estimation and monitoring of QoS parameters, such 
as loss rates and delays, are of considerable interest to network engineers and 
Internet service providers. The paper provides a review of the inverse problems 
and recent research on inference for loss rates and delay distributions. Some 
new results on parametric inference for delay distributions are also developed. 
In addition, a real application on Internet telephony is discussed. 



1. The inverse problems 

Consider a topology with a tree structure denned as follows: T = {V, £} has a 
set of nodes V and a set of links or edges £ . Figure Q] shows two examples, a 
simple two-layer symmetric binary tree on the left and a more general four-layer 
tree on the right. Each member of £ is a directed link numbered after the node 
at its terminus. V includes a (single) root node 0, a set of receiver or destination 
nodes 1Z, and a set of internal nodes T. The internal nodes have a single incoming 
link and at least two outgoing links (children). The receiver nodes have a single 
incoming link but no children. For the tree on the right panel of Figure [l] 72. — 
{2, 3, 6, 8, 9, 10, 11, 12, 13, 14, 15} and T = {1, 4, 5, 7}. 

All transmissions are sent from the root (or source) node to one or more of the 
receiver nodes. This generates independent observations Xk at all links along the 
paths to those receiver nodes. Let X denote this set of measurements. These data are 
not directly observable; rather we can collect only end-to-end data at the receiver 
nodes: Y r — /(X) for r £ 1Z. The statistical inverse problem is to reconstruct the 
distributions of the link-level X^s from these path-level measurements. 

Examples of /(•) are: /(X) = £fceP(o,r) X k , /(X) = ILeP(o,r) x k, and /(X) = 
minfc e -p(o !r ) X}., and /(X) = maxj. e p( .r) Xk, where V(0, r) is the path between the 
root node and the receiver node r. In this paper, we will be concerned only with 
the first two cases of /(•) above. 

To understand the statistical issues and challenges involved, let us examine some 
simple examples. 
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Fig 1. Examples of tree network topologies. A binary two-layer tree is shown on the left panel and 
a general four-layer tree on the right panel. The path lengths from the root to nodes belonging to 
the same layer are the same. 



Example 1. Consider the two-layer binary tree on the left panel of Figure HJ and 
suppose the Xk are binary with P(Xk = 1) = ctk, k = 1,2,3 for the three links. 
Further, the root node sends transmissions to the receiver nodes one at a time. 
Take /(a, b) = ab. Then, the observed data are Y 2 j = X\jX 2 j for transmission j 
and Ys m = Xi m X 3m for transmission m. They are independent Bernoulli with prob- 
abilities a\a 2 and a\a 3 , respectively. Suppose we send M transmissions to receiver 
node 2 and N transmissions to receiver node 3. Let Mi and N\ be the respective 
number of "ones" . Then, Mi and N\ are independent binomial random variables 
with success probabilities a\a 2 and a\a 3 . From these data, we can estimate only 
a\a 2 and otia.3. The individual link-level parameters a\, a 2 and a 3 cannot be fully 
recovered. 

Example 2. Take the same two-layer binary tree with binary outcomes with 
f(a, b) = ab as above. But now the root node sends transmissions to receiver nodes 

2 and 3 simultaneously. In other words, the m-th transmission generates random 
variables X\ m , X 2m and X 3m on all of the links. We observe Y 2m = X lm X2 m 
and Yzm = X\ m X 3m . The distinction from Example 1 is that the X\ m is com- 
mon to both Yim and Y$ m . Now, each transmission has 4 possible outcomes: 
(1,1), (1,0), (0,1), (0,0) depending on whether the transmission reaches none, 
one, or both of the receiver nodes. If we send N such transmissions to nodes 2 and 

3 simultaneously, the result is a multinomial experiment with probabilities ai«2, 
«i(l — a 2 ), (1 — ai)a 2 , and (1 — ai)(l — a 2 ) corresponding to the four outcomes. 
Let N(i,j) denote the number of events with outcome Then, E[N(1, 1)] = 
aia 2 a 3 , E[N(1, 1) + N(l, 0)] = a x a 2 , and E[N(1, 1) + N(0, 1)] = a x a^. It is easy 
to see that we can estimate all the three link-level parameters from these measure- 
ments. Thus, the data transmission scheme plays an important role in this type of 
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inverse problems. 

Example 3. Again we have a two-layer binary tree but now /(a, b) = a + b. Then 
Y2 — Xi + X2 and Y3 = X\ + A3. Let Fk be the distribution of the link-level 
random variables Xk G 1Z, for k = 1,2,3. Assume, as in Example 2, that the 
root node sends transmissions simultaneously to both receivers. In this case, even 
with simultaneous transmission to both receivers, the link-level parameters are not 
always identifiable. Just take Xk to be independent Normal(/^, 1), k = 1, 2, 3. Then 
I2 and Y3 are bivariate normal with mean \i\ + /X2 and \i\ + /i3, variance 2 and 
correlation 1. One can see that the individual fik cannot be recovered from the joint 
distribution of Y2 and Y3. Additional assumptions on the distribution are needed 
in order to solve the inverse problem. We will revisit this issue. 

Example 4. Consider now the more general tree on the right panel of Figure [TJ 
Again, we send transmissions to all of the receiver nodes simultaneously. If the ran- 
dom variables are binary and f(x\, . . . , x p ) — Ilj=i x h a ^ the link- level parameters 
are identifiable. The same is true for a general Xk with /(xi, . . . , x p ) — x j 
under suitable conditions on the distribution of the Xk (as discussed later in the 
paper). However, it may be "expensive" to send transmissions to all receiver nodes 
simultaneously. Instead, can we schedule transmissions to some judicious subsets of 
the receiver nodes at a time and combine the information appropriately to estimate 
all the link-level parameters? It is clear from Example 1 that it is not enough to 
send transmissions to one receiver node at a time. How should the transmission 
scheme be designed in order to estimate all the parameters? Are there some "good" 
schemes (according to some appropriate criteria)? 

These examples are simple instances of issues that arise in the context of ana- 
lyzing computer and communications networks and are collectively referred to as 
active network tomography. In the next section, we will describe the network ap- 
plication and the need for estimating quality-of-service (QoS) parameters such as 
loss rates and delays. Section 3 provides an overview of recent results in the lit- 
erature on the design of transmission experiments and inference for loss rates and 
discrete delay distributions. A real application on data collected from the campus 
network at the University of North Carolina, Chapel Hill is used to illustrate some 
of the results. Section 4 develops some new results on parametric inference for delay 
distributions. 



2. Active network tomography 

The area of network tomography originated with the pioneering work of Vardi 
[3] where the term was first introduced. His work dealt with another type of in- 
verse problem relating to origin-destination (OD) traffic matrix estimation. The 
OD information is important in network management, capacity planning, and pro- 
visioning. In this problem, one is interested in estimating the intensities of traffic 
flowing between the origin-destination pairs in the network. However, we cannot 
collect these data directly; rather, one places equipment at the individual nodes 
(routers/switches) and collects aggregate data on all traffic flowing through the 
nodes i € V. The goal is to recover distributions of origin-destination traffic be- 
tween all pairs of nodes in the network. There has been considerable work in this 
area, and a summary of the developments can be found in Q . 

Active network tomography, on the other hand, is concerned with the "opposite" 
problem of estimating link-level information from end-to-end data. One sends test 
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probes (packets) (active probing) from a source to one or more receiver nodes on the 
periphery of the network and gets end-to-end path-level data on losses and delays. 
One then has to solve the inverse problem of reconstructing link-level loss and 
delay information from the end-to-end data. The specific goal is to estimate QoS 
parameters such as loss rates and delays at the link level. The reason for probing 
the network from the outside is that Internet service providers or other interested 
parties often do not have access to the internal nodes of the network (which may 
be owned by a third party) . Nevertheless, they have to assess QoS of the links over 
which they are providing service. Active tomography offers a convenient approach 
by probing the network from nodes located on the periphery. 

The probing and data collection are done with dedicated instruments at the root 
node and receiver nodes. These packets can be sent to one receiver at a time (unicast 
transmission scheme) or to a specified subset of receivers (multicast scheme). Some 
networks have turned off the multicast scheme for security reasons. In this case, 
one sends unicast packets to several receivers spaced closely in time with the goal 
of trying to mimic the multicast scheme. 

What causes losses and delays of packets over the network? When a packet 
arrives at a node, it joins a queue of incoming packets. If the buffer is full, the 
packet is dropped, i.e., lost. Depending on the protocol, the packet may or may 
not be resent. Packets also encounter delays along the path, primarily due to the 
queueing process above. 

In the case of losses, the binary outcome = or 1 indicates whether the packet 
is lost (dropped) or not. In terms of the examples in Section 1, f(xi, . . . ,Xk) — 
Y\k x k, an d the end-to-end loss Y — YlkeV(o r) -^k = 1 if the packet transmitted 
along the path V(0,r) reached the receiver node r and zero otherwise. For delays, 
f(xi, . . . , xk) = J2k x k> an d the end-to-end observation is Y = J2keV(o r) ^k, tnc 
path-level delay. 

The physical topology of a network is usually complicated. But the logical topol- 
ogy with a single source node can often be represented as a tree. For example, the 
left panel of Figure 2 shows the physical topology of a subnetwork at the campus 
of the University of North Carolina at Chapel Hill. The right panel shows the cor- 
responding logical topology, which is a tree with a directed flow. We will revisit 
this network later in the paper. It is possible to deal with topologies with multiple 
sources, other kinds of transmission schemes (two-way flows), and so on. But for 
simplicity, we will restrict attention to the tree structures in this paper. 
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Fig 2. Left panel: Schematic of the UNC network; Right panel: Logical topology of the UNC 
network. 
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3. Literature review of loss and discrete delay inference 

Most of the results in the literature on active tomography have been developed 
under the assumption that the loss rates and delay distributions are temporally 
homogeneous and are independent across links. We will also use this framework. 
The assumption of temporal homogeneity is reasonable as the probing experiments 
are done within the order of minutes. The assumption of independence across links 
is less likely to hold. However, the nature of the dependence will vary from network 
to network, and it is difficult to obtain general results. 

3.1. Design of probing experiments 

We noted in Example 1 that the link-level parameters are not identifiable under 
the unicast transmission scheme (sending probes to one receiver at a time). The 
multicast scheme, which sends packers to all the receivers in the network simultane- 
ously, addresses this problem for loss rates and, under some additional conditions, 
for delay distributions as well. 

However, this scheme has a number of drawbacks. It creates more traffic than 
necessary for estimating the link-level parameters. Also, the data generated are very 
high-dimensional. For example, in a binary symmetric tree with L layers, there are 
R = 2 L — I receiver nodes. A multicast scheme for measuring loss rates results in 
a multinomial experiment with 2 R possible outcomes. This is a large number even 
for moderately sized trees. The most important drawback, however, is that it is 
inflexible and does not allow investigation of subnetworks using different intensities 
and at different times. In practice, one may want to probe sensitive parts of the 
network as lightly as necessary to avoid disturbance. So there is a need for more 
flexible probing experiments. As pointed out in Example 4, this raises interesting 
issues on how to design the probing experiments. 

A class of flexible probing experiments, called flexicast experiments, were in- 
troduced and studied in Xi et al. [13] and Lawrence et al. This consists of a 
combination of schemes for different values of k with each scheme aimed at study- 
ing a subnetwork. However, each of the scheme by itself will not necessarily allow 
us to estimate the link-level parameters of that subnetwork. The data have to be 
combined across the various k-cast schemes to estimate the link-level parameters. 

To illustrate the ideas, consider the network on the right panel in Figure [TJ 
The multicast scheme sends probes simultaneously to {(2, 3, 6, 8, 9, 10, 11, 12, 13, 14, 
15)}. Two possible flexicast experiments are: 

(1) {(2, 3), (6, 12), (13, 14), (8, 15), (9, 10), (11)} 
and 

(2) {(2, 3), (6), (12, 13, 14, 15), (8, 9, 10, 11)}. 

The former consists of only bicast (two receiver nodes at a time) and unicast 
schemes. Intuitively, the latter scheme appears to more "efficient" but we will see 
shortly that it does not allow one to estimate all the link-level parameters. 

A full multicast scheme for this tree will result in 11-tuples or 11-dimensional 
data. The first flexicast experiment using pairs and singletons can cover the whole 
tree with five pairs and one singleton. The resulting data are considerably less 
complex in terms of processing and computations for inference. This advantage is 
particularly important for trees with many layers. 
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Of course, not all flexicast experiments will permit estimation of the link-level 
parameters. To discuss the technical issues associated with the identifiability prob- 
lem, consider first the notion of a splitting node. For a fc-cast scheme, an internal 
node is a splitting node if the scheme splits at that node. For example, for the tree 
on the right panel of Figure Q] the bicast scheme {(6, 12)} splits at node 4. Xi et 



al. [17( showed that the following conditions are necessary and sufficient for iden- 
tifiability of link-level loss rates: (a) all receiver nodes are covered; and (b) every 
internal node in the tree is a splitting node for some k-cast scheme in the flexicast 
experiment. Lawrence et al. [8| studied the delay problem and showed that the 
same conditions are also necessary and sufficient for estimating delay distributions 
provided the distributions are discrete. The case where the delay distributions are 
not discrete is discussed in the next section. 

Consider again the flexicast schemes in equations (1) and (2) for the tree on the 
right panel in Figure [T] The first one based on a collection of bicast and unicast 
schemes satisfies the conditions. For the second one, none of the k-cast schemes 
split at node 4. 

There are many flexicast experiments that satisfy the identifiability requirements, 
and the choice among these has to be based on other criteria. Experiments based on 
just bicast and unicasts have minimal data complexity - just 1- and 2-dimensional 
outcomes. However, these provide information on just first and second-order de- 
pendencies and will be less efficient (in a statistical sense) to k-cast schemes with 
higher values of k. In particular, the full mulitcast scheme will be most efficient in 
this sense. So the overall choice of the flexicast experiment has to be a compromise 
between statistical efficiency and flexibility including the ability to adapt over time 
to accommodate changes in network conditions. 

3.2. Inference for loss rates 

Inference for loss rates was first studied in Caceres et al. 0] for the multicast scheme. 
A recent, up-to-date list of references can be found in Xi et al. [13] who developed 
MLEs based on the EM algorithm for flexicast experiments. We provide next a brief 
review of these results. 

Each fc-cast scheme in a flexicast experiment is a fc-dimensional multinomial ex- 
periment. Specifically, each outcome is of the form {Z ri , . . . , Z rk } where Z T] = 1 or 
depending on whether the probe reached receiver node rj or not. Let N( ri rfc ) 
denote the number of outcomes corresponding to this event, and let 7( ri ....,r fc ) be the 
probability of this event. Then the log- likelihood for the fc-cast scheme is propor- 
tional to 7(n ...,rfc) l°g(-^(ri,...,r fc ))- The overall log-likelihood is just the sum of the 
log-likelihoods for these individual experiments. However, the "/( ri ,...,r k ) are compli- 
cated functions of the link- level loss rates, so one has to use numerical methods 
to obtain the MLEs. 

The EM algorithm is a natural approach for computing the MLEs and has been 
used extensively in network tomography applications (see [!, H, [Hj] ) . The structure 
of the EM-algorithm for general flexicast experiments was developed in Xi et al. 



171 ]. While the E-step can be complex for arbitrary collections of fc-cast schemes, it 
simplifies for flexicast experiments comprised of bicast and unicast schemes as seen 
below. 

Let Sb be the splitting node for bicast pair b = (ib,jb)- Then, 7r(0,Sf,), ir(sb,ib) 
and ir(sb,jb), the three path probabilities for this bicast pair are products of the 
ctfc. Starting with an initial value a' ' let be the value after the fc-th iteration. 
Then, we can write the (k + l)-th iteration of the E-step as follows: 
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E-step: 

1. For each bicast pair: 

(a) Use to obtain the updated path probabilities (0, Sb), Tr( k \s b ,i b ), 
^(sbjb) and 7o (fe) . 

(b) For each node £ G V(0, s b ) U V(s b , i b ) U V(s b ,j b ), compute V$j +1) = E sW [V e \ 
Mb], where Af b = {N$ , N^, N% , iV-fJ are the collected counts of the four 
possible outcomes, as follows. 

For node £ G V(0,s b ), 

(k) 

y(k+l) _ N b_ ^ 1 ~ <H 

V £,b - ly ly 00 b (fc) ' 

7oo 

For link £ G V(s b ,i b ), 

V '' b _JV_JVoiX l-*W{s b ,i b ) N °° • 
For link £eP(s b ,jb), 

2. Unicast schemes: Let node £ G V(0,u) for a unicast transmission to receiver 
node u, and compute 

i _ (fe) 

y(fc+l) — M u — M u V ^ 

^ " iV iV ° X 1-7T(*)(0,«) 



M-step: The (fc + l)-th update for the M-step is simply 

(fc+l) _ l^beBe V £,b + l^ueUe v i,u 



^ EbeB e N b + Eueu e N- 

where Be is the set of bicast pairs that includes the node £ in its path and U i is 
the set of all unicast schemes that includes node £ in its path. 

In our experience, the EM algorithm works reasonably well for small to moderate 
networks when used with a flexicast experiment that consists of a collection of bi- 
cast and unicast schemes. For large networks, however, it becomes computationally 
intractable. In on-going work, we are developing a class of fast estimation meth- 
ods based on least-squares methods and are studying their application to on-line 
monitoring of network performance. 



3.3. Inference for discrete delay distributions 

For the delay problem, let Xk denote the (unobservable) delay on link fc, and let 
the cumulative delay accumulated from the root node to the receiver node r be 
Y r = X)fceP(o r) Xk- Here V(0,r) denotes the path from node to node r. The 
observed data are end-to-end delays consisting of Y r for all the receiver nodes. 

Most of the papers on delay inference assume a discrete delay distribution. Specif- 
ically, if q denotes the universal bin size, Xk G {0, q,2q,..., bq} is the discretized 
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delay on link k and bq is the maximum delay. Let ak(i) = P{Xf. = iq}. The in- 
ference problem then reduces to estimating the parameters ajM) for k £ £ and i 
in {0, 1, . . . , b} using the end-to-end data Y r , Lo Presti et al. [10( developed a fast, 
heuristic algorithm for estimating the link delays. Liang and Yu 0] developed a 
pseudo-likelihood estimation method. Nonparametric maximum likelihood estima- 
tion under the above setting was investigated in Tsang et al. [13J and Lawrence 
et al. 8|. Shih and Hero [12j | examined inference under mixture models. See also 
Zhang [1 81 ] for a more general discussion of the deconvolution problem. 

We discuss nonparametric MLE with discrete delays in more detail. Let dk = 
[oife(O), afc(l), . . . ak(b)}' and let a = [a' , d' 1; . . . , ct', g ,]' . The observed end-to-end 
measurements consist of the number of times each possible outcome y was observed 
from the set of outcomes y c for a given scheme c. Let ATS denote these counts. 
These are distributed as multinomial random variables with corresponding path- 
level probabilities 7 c (y; oT). So the log- likelihood is given by 

cec yey 

This cannot be maximized easily, and one has to resort to numerical methods. 

Again, the EM algorithm is a reasonable technique for computing the MLEs. 
See Q for multicast schemes and [|| for inference with flexicast experiments. How- 
ever, the complexity of the EM algorithm, in particular computing conditional 
expectations of the internal link delays for each bin,_is prohibitive for all but fairly 
small-sized networks. To deal with larger networks, [8( developed a grafting method 
which fits "local" EMs to the subtrees defined by the fc-cast schemes and then com- 
bines the estimates through a fixed point algorithm. This hybrid algorithm is fast 
and has reasonable statistical efficiency compared to the full MLE. 

For bicast schemes, the resulting algorithm has third-order polynomial complex- 
ity, a substantial improvement over the full bicast MLE. The heuristic algorithm in 
[lCj is based on solving higher order polynomials and is much faster. However, it 
uses only part of the data and is quite inefficient. The pseudo-likelihood method of 
Q uses only data from all pairs of probes in the multicast experiment. This is simi- 
lar in spirit to a flexicast experiment comprised of only bicast schemes, although in 
this setting the schemes would be independent. The computational performance of 
the pseudo-likelihood method is faster than the MLE based on the full multicast. 
It is comparable to doing a full EM based on data from all possible bicast schemes. 
This will still not scale up well to very large trees as it includes all possible bicasts 
which can involve a large number of schemes. Furthermore, using the full MLE 
combining the results across all schemes is computationally intensive. The flexicast 
experiments, on the other hand, are typically based on a much smaller number of 
schemes (eve if one restricts attention to bicasts). Further, the grafting algorithm 
is much faster for combining the results across the schemes. 

3-4- Application to the UNC network 

We use a real example to demonstrate how the results from active tomography 
are used. The example deals with estimating the QoS of the campus network at 
the University of North Carolina at Chapel Hill and assessing its capabilities for 
Voice- Over-IP readiness. 

This network has 15 endpoints which were organized into the tree shown in 
Figure [2l Node 1 is the main campus router and it connects to the university gate- 
way. Nodes 2, 3, and 9 are also large routers responsible for different portions of 
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Fig 3. Probability of large delay on 3/7/2005. 



the campus. The accessible nodes are all located in dorms and other university 
buildings. The root node of the tree was Sitterson Hall which houses the computer 
science department. The network was probed in pairs using the following flexicast 
experiment: {(4, 5), (6, 7), (8, 10), (11, 12), (13, 14), (15, 16), (17, 18)}. A single prob- 
ing session consisted of two passes through the collection of experiments sending 
about 500 probes to each pair in a single pass. The experiment was conducted over 
the course of several days in order to evaluate both the network and the methodol- 
ogy. We have collected extensive data but show only selected results for illustrative 
purposes. 

The data presented here were collected at 9:00 a.m., 12:00 p.m., 3:00 p.m., 6:00 
p.m., and 9:00 p.m. on March 1 and 17 of 2005. March 17 was during spring break. 
For both days, we chose a bin size of q — .0001s to assess occurrences of large delays 
on the network. The large bin size also allowed us to use the full MLE to estimate 
the delay distributions. Figures [3] and [4] provide a picture of the probability of large 
delay (larger than a specified threshold) throughout the course of the day. 

From Figure [31 we see that many buildings (Venable, Davis, Rosenau, Smith, 
Greenlaw, and South) show a typical diurnal pattern. These buildings are either 
administrative or departmental building; so the majority of users follow a regular 9 
to 5 schedule. Other buildings are either more uniform throughout the day or even 
more activity at night. Hinton, for example, is a large freshman dorm and thus the 
drop during the day and increase at night are expected as the residents return from 
classes and other activities in the evening. 

A comparison of Figures [3] and [4] shows the difference in dorm activity before 
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Fig 4. Probability of large delay on 3/17/2005. 



and during spring break. Everett, Old East, Hinton, and Craige are dorms. The 
data collected during spring break reveals almost no large delays in three out of 
four of these buildings. This is of course to be expected. The Hinton dorm is espe- 
cially interesting, since it experienced very little congestion over the break, but a 
significant increase to pre-break levels on the first day after the break (post-break 
results are not shown here). 

As a consequence of this study, it became clear that many of the building links 
require upgrades in order to support delay-sensitive applications such as VoIP. Some 
of the departmental and administration buildings (Smith and South) already have 
large delays even without additional VoIP traffic. 



4. Parametric inference for delay distributions 

This section develops some new results on parametric inference for delay distribu- 
tions. We start with a framework that includes two components: a zero delay and 
a (non-zero) finite delay. Specifically, let Xk be the delay on link k, and suppose 

(3) X k ~p k S { o} + (1 -p k )F(x;O k ). 

Here we assume that F{x\9k) does not give any mass to zero, for all k. So, a 
successful transmission (finite delay or no loss) experiences an empty queue (no 
delay) with probability pk and has some non-zero delay that is distributed according 
to a parametric distribution F(-) indexed by 9k with probability 1 — pk- 
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4 5 6 7 

Fig 5. Three- layer, binary tree 



4-1. Identifiability 



The basic issue for delay distributions is the one posed in Example 2 in the intro- 
ductory section, viz., whether the parameters of a simple two-layer tree (left panel 
of Figure[T]) are estimable from probes sent simultaneously to both receivers. If this 
holds, then the result extends readily to general flexicast experiments that satisfy 
the conditions in Section 3.1 (using the arguments in H). We discuss the details 
briefly. See also 0, [1| for a general discussion of identifiability issues. 

We consider two cases: 

Case 1: If pfc > for all k, no additional assumptions on the distribution F(-) 
are needed. All the link-level delay parameters (pk and 9 k) are identifiable using 
flexicast experiments provided they satisfy the conditions in Section 3.1: a) every 
receiver node is covered and b) every internal node is a splitting node for some 
sub-experiment. 

To see this, consider the two-layer tree on the left panel of Figure [1] Condition 
on the subset of data with Y2,m = and Yz,m > for probes m — 1, . . . ,M. 
Now, Y2. m — implies that both of the internal links X\. m and X2, m had zero 
delay, so Y 3 ^ m — X^ m . So we can use this subset of Y 3i7n to estimate F(x;9z). A 
similar argument can be used to estimate F(x; 62) using the subset of Y} }ln > 
and Y2.n1 = 0. Once these two distributions are estimated, we can easily estimate 
F(x;9i). 

Case 2: If pk — 0, then we need additional assumptions on the delay distributions 
F(x;9k). As we noted in Example 2, the means of the normal distributions are 
not identifiable. If the moments of order two and higher depend on the first mo- 
ment, they will provide additional information for estimating the parameters. One 
such example is when the variance is a function of the mean (as is the case with 
exponential, gamma, log- normal, and Weibull distributions). 



Example 5. We consider here a more general situation with the three-layer bi- 
nary, symmetric tree shown in Figure [5] Let the delay on link k be distributed 
Gamma(afc, /3k)- Suppose we use the flexicast probing experiment {(4, 5), (5, 6), (6, 
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Cov(Y 4 < 4 ' 5 > ,Y 5 < 4 > 5 >) - Cov(y 5 < 5 < 6 \F 6 < 5 < 6 >) = a 2 pl 

Cov(y 6 < 6 - 7 \ r 7 (6 ' 7> ) - Cov(y 5 <5 ' 6) ,y 6 <5 - 6) ) = a,pl 

Let E(Y r ) — v r . We also get the following equations based upon third moments: 



E(r 5 <5 ' 6> 


,. \2/ v <5,6> 


- ve) 


= 2a 1 /3 1 3 , 


E(r 5 <5 ' 6> 


- u h) [y e 


- v&) 




E(r 5 <5 ' 6> 


1/ ^2/^(5,6) 


- ve) 





The corresponding sample moments can be used to estimate the terms on the left. 
Then, estimators for ai,0i, 012, $2, 03, and $3 can be obtained by rearranging the 
above equations. 

The parameters for the receiver links can be estimated with just the first mo- 
ments. For example, the equations for link 4 are: 



E(Y4) = aifli + a 2 (32 + 014(34, 



Var(Y 4 ) = OiiPl + a 2 Pl + 



The unknown parameters are easily obtained from the observed values on the left 
and the estimated parameters on the right. 



4-2. Maximum likelihood estimation 



It turns out that p^, the probability of zero delay, can be estimated using methods 
analogous to those for loss rate discussed in Section 3.2. Recall that a zero delay 
will be observed at the receiver node if and only if there is zero delay at every link. 
On the other hand, a non-zero delay at the receiver link may include zero delays 
at some links, so we have to "recover" this information from the aggregate level 
data. But this is equivalent to the problem with of losses. A packet received at the 
receiver node implies "success" at all the links. A packet not received at the receiver 
node involves a combination of successes and losses, with at least one loss. Thus, 
we can use the data with zero-delays and positive delays in an analogous manner 
to estimate the zero-delay probabilities pk- 

To simplify matters, therefore, we will focus on parametric estimation of Fk(x; 9k) 
assuming that pk — 0. Let us consider some simple examples with the two-layer tree 
with two receivers in the left panel of Figure [T] and with exponential and gamma 
distributions for delays. The gamma family is closed under convolution if the scale 
parameters are the same, so the distribution of the end-to-end delays belong to the 
same family as the link-level delays. Even for these simple cases, we will see that 
the MLE computations are intractable. 

a) Exponential Distributions: Suppose the delay distribution on each link is 
exponential with parameter Afc. Further, we send N probes to both receivers (2,3) 
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simultaneously. The log-likelihood function is 

(4) l(X; Y) = iVlog(Ai) + N\og(X 2 ) + iVlog(A 3 ) - AHog(Ai - A 2 - A 3 ) 

JV N 

- A 2 ^t/i, 2 - a 3 y i}3 

i=l i—1 

N 

+ ^^g[l - exp{-(Ai - A 2 - A 3 ) mm(y it2 ,yi, 3)}] 



There is no analytic solution to maximize this equation over A, so one would have 
to use an iterative technique, such as EM or Newton-Raphson, to find the MLEs 
even in this simple case. 

We examine the details for the EM-algorithm. The exponential distribution is a 
member of the exponential family, so the (unobserved) sufficient statistics are the 
total link-level delays Ya=i X ^ Ya=i x i,2, and Yn=i x i,3- Since X %a = Y^-X^i 
and Xm — Yi 3 — Xi i, we need to compute only the conditional expected values 
of J27=i X hi m ^ ne E-step. The conditional distribution [X^F;? = a , Y3 — b) has 
density 

/r\ ,„s exp{-(Ai - A 2 - A 3 )a;} 

(5) 9{X) = , — r — p: , 0<a:<aA6, 

O^a, 0, Ai, A 2 , A 3 j 

where a A b = min(a, b) and the constant of proportionality is 

1 - exp{-(A! - A a - A 3 )(a A b)} 



C(a, b; Ai,A 2 , A 3 ) = 



Ai — A 2 



Now / Q aAb xg(x)dx is an incomplete gamma function and one can compute the 
expected value X)"=i E(Xi,i |^, 2) ^,3) a s a ratio of the incomplete gamma function 
and the constant C(a, b; X%, A 2 , A 3 ). Thus, the MLEs of the A& can be computed 
without too much trouble in this simple two-layer binary case. 

How well does this extend to more general cases? Suppose we have a three layer 
binary tree (Figure[5]), and we use bicast schemes (4,5), (6,7), and (5,6). Consider 
the scheme (4, 5) which splits at node 2. We can try and mimic the computations 
for the two-layer tree above. However, we have to consider the combined path ^(0,2) 
whose delay is the sum of delays for links 1 and 2. The exponential distribution is 
not closed under convolution, so the distribution is now more complex. The details 
for more general trees will depend on the number of links involved before-and-after 
the splitting node. The problem is even more complex for multicast schemes with 
multiple splitting nodes. We see that the MLE computations are complicated even 
for simple exponential distributions. 

b) Gamma Distributions: Gamma distributions with same scale parameter are 
closed under convolution, i.e., the path delays which are sums of link- level delays 
are still gamma. Specifically, let ~ Gamma(ai, ($) and independent across k for 
k G £ . Wc start with the simple two-layer binary tree. Then, the likelihood function 
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of the observed data is 



L(data) 



n 

i=l 
n 

n 



h(?)h{yi,2 - x)h{Vi,3 - x)dx 



Vi,2^Vi,3 



1 



1 



r(a 2 ) /?"•• 
1 1 



r(ai) 

i,2 - ar) 



i,3 



02-1 exp{- 



_1 exp{- 



3/i,2 



n 



i 



i 



r(ai)r(a 2 )r(a 3 ) ^i+"2+a 3 

ai,2Aai,3 



Vi,3 ~ x 

exp{-- 



} 



dx 



(Vi,2 +y li3 )} 



As before, the MLEs will have to be obtained numerically. 

Let us consider the details of the EM-algorithm. The Gamma distribution is 
a member of the exponential family with sufficient statistics X and log(X). For 
the two-layer tree, we need to compute just the conditional expectation of X\ 
and log(Xi), the unknown delays on the first link. The conditional distribution 
[Xi|Yi = a, Y 2 = b] is now given by 



(6) 



_1 exp{§} 



C(a, b\ ai, a 2 , a 3 , 0) 
where the proportionality constant is 

raAb 



C(a,b;ai,a 2 ,a>3,f3) 



l (o-x) aa_1 (&- 



< x < a A 6, 



a;) Q3 - 1 exp{-}da;. 

P 



This can be used to compute i?[.Xi|Yi, Y2] and i?[log(Xi)|Yi, Y2] numerically. Note 
that 

^ v , v vi C , (Yi,y 2 ;ai + l,a 2 ,a 3 ,^) 

£r[Ai ri, r 2 j - — — — — . 

C(ri, r 2 ; ai, a 2 , a 3 ,pj 

How well does this extend to trees with more than two layers? It turns out that 
the full MLE is still not feasible. However, a combination of "local" MLEs and a 
grafting idea (along the lines of j§|) is feasible. Consider the 3-layer tree in Figure 
Suppose we use a flexicast experiment with 3 bicasts (4,5), (6,7), and (5,6). 
The bicast scheme (4, 5) splits at node 2. So we can combine links 1 and 2 into 
a single link and use the previous results for the two-layer tree to get estimates 
for this subtree. Note that the delay distribution for the combined links 1 and 2 is 
T(a% +a 2 , 0), So we can get "local" MLEs for a\ +a 2 , 014, and f3 from the bicast 
experiment (4, 5). Using a similar argument, we can get estimates for a\ + a 3 , a§, 
a? and /3 from the bicast scheme (6, 7) and estimates for a%, a 2 + as, 013 + a-r and 
f3 from the bicast scheme (5, 6). Now we can use one of several methods to combine 
these estimates to get a unique set of estimates for all of the a>k and 0. Possible 
methods include ordinary or weighted LS. 

We do not pursue this strategy here as the specifics work only for special cases. 
The main message here is that it is not easy to compute the full MLE even in very 
simple cases, and the problem becomes completely intractable as the size of the 
tree and number of children in the links grow. 
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4-3. Method- of -moments estimation 

We discuss the use of method of moments which estimates the parameters by match- 
ing the population moments to the sample moments using some appropriate loss 
function. General losses are possible, but squared error loss leads to more tractable 
optimization and the large-sample properties are easy to establish. 

Let H = {1, . . . , m} be the index set of the probes used in the probing exper- 
iment. Denote the observed data Y r (l), Y r (m) as Y r (H). Let MHH) be the 
observed i-th moment for the j-th scheme based on the probes in H . Let M.\ (9) be 
the functional form of the i-th moment from the j-th probing scheme. For example, 
for the two-layer tree in Figure [1] with Gamma(afe, (3k) distributions on each link, 
we get the following relationships: 

E(Y 2 ) =ai/?i +a 2 p 2 , 
E(Y 3 ) = a 1 f3 1 +a 3 (3 3 , 

Cov(y 2 ,y 3 ) = a^l 

E(Y 2 -v 2 ) 2 {Y 3 -v 3 ) = 2a 1 $, 

V&r(Y 2 ) = ai [3 2 + a 2 f3 2 , 
Var(r 3 )=ai/3? + a 3 /3 3 2 . 

We can now estimate the parameters by minimizing the squared error loss 

m „ 

Q(9; M(H)) = E E [ M IW ~ Mi i( 6 { ' ■ 

3=1 i 

This is a special case of the nonlinear least squares problem and can be solved using 
iterative methods such as the Gauss- Newton procedure (see [l[ for example). After 
rewriting the loss function as a single sum over all the moments, we consider the 
derivatives 

3 i 3 

These can be expressed in matrix form as 
~dQ(0;M(H)) 



06 



= D'[M(H) -M(6)], 



where Di j = dJ ^g^ ■ The moments at the true value can be expanded using a Tay- 
lor series expansion around an initial guess 6*'°' as A4(9q) w M.{9^) +D(0q — 9^). 
Computing the residuals and replacing the true value with the observed moments 
gives an updating scheme based on solving a linear system. Thus at iteration q, we 
have the following linear system. 

M(H) - M{9 {q) ) = Dp. 

Solving this, we get the next iteration as 9^ q+1 ^ — 9^ + (3. 

In general, each iteration should be closer to the minimizer. However, there 
can be situations where the step increases the sum of squares. To avoid this, we 
recommend the modified Gauss-Newton in which the next iteration is given by 
Q( q +i) = Q{q) + r/ g where q < r < 1. This fraction can be chosen adaptively at each 
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step. If the full step reduces the sum of squares, then it is taken. Otherwise, we can 
set r — .5. If the half step fails to reduce the sum of squares, then it can be halved 
again. This guarantees that the loss function is reduced with every step and gives 
convergence to a stationarity point. Examination of the derivatives will indicate if 
the point is a minimum. 

The algorithm has useful complexity properties in terms of both memory and 
computation. Since the estimation is based only on the moments, these values are all 
that need to be stored. This is a vast improvement over algorithms that require all 
of the data or the counts of the binned data. Further, the efficient implementation of 
the algorithm, involving a QR factorization and one matrix inversion, gives compu- 
tational complexity of 0(m 3 ) where m is the number of required moments. Again, 
this is a large improvement over other methods that have exponential complexity. 
Further improvement is achieved in many cases using sparse matrix techniques. 
These two points make the approach ideal for application requiring real-time esti- 
mates. 

The ordinary non-linear least-squares (OLS) method of moment (MOM) esti- 
mators can be inefficient as the different moments are correlated and have unequal 
variances. Since these can often be computed and estimated easily, one can use 
generalized least-squares (GLS) to improve the efficiency. A limited comparison of 
the efficiencies is given in the next subsection. 

It is easy to show that the method-of-moment estimators based on OLS or GLS 
are consistent and asymptotically normal as the sample sizes (numbers of probes) 
increase on a given network. The large-sample distribution can be used to compute 
approximate confidence regions and hypothesis tests which are useful in monitoring 
applications. 



4-4- Relative efficiency of the method- of -moments : a limited study 

We conducted a small simulation study to assess the performance of the MOM 
estimators versus the MLE. This was done on a two-layer binary tree (left panel 
of Figure [T]) for exponential distributions. This is one of the few instances when 
it is practical to construct the MLE. We used two MOM estimators: the OLS and 
the GLS schemes (described above). The GLS methods used a weighting scheme 
based on an empirical estimate of the covariance of the observed moments. Relative 
efficiency is defined as the ratio of the variance of the MLE to the variance of the 
estimator of interest. 

We considered two cases: a) each link has the same mean; i.e., 1/Afe = 1/2, k = 
1, 2, 3, and b) each link has its own mean, 1/Ai = 1/2, I/A2 = 1/4, and I/A3 = 1/6, 
respectively. For both scenarios, 1000 data sets of size 3000 were generated. Boxplots 
of the three sets of estimators are shown in Figures [6] and [7] The procedures appear 
to be unbiased. When the means are the same, the relative efficiencies of the OLS 
MOM are 1.72, 1.33, 1.41 and the relative efficiencies of the GLS MOM are 1.12, 
1.11, and 1.12. When the means are different, the relative efficiencies for the OLS 
are 2.43, 5.09, and 9.13 and the relative efficiencies for the GLS are 1.07, 1.24, and 
1.34. 

In this example, the GLS MOM appears to be quite efficient compared to the 
MLE. However, a much more extensive study is clearly needed to quantify the 
performance of the MOM estimators. 
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Fig 6. Boxplots comparing the MLE with MOM: Case 1 - All means are equal. 



Fig 7. Boxplots comparing the MLE with MOM: Case 2 - The three means are different. 
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4-5. Analysis using the NS-2 simulator 

We now describe a study of the MOM estimators in a simulated network environ- 
ment. The ns-2 package is a discrete event simulator that allows one to generate 
network traffic and transmit packets using various network transmission protocols, 
such as TCP, UDP, ((TBI) over wired or wireless links. The simulator allows the 
underlying link delays to exhibit both spatial and temporal dependence with corre- 
lation between sister links (children of the same parent, i.e. 4, 5, and 6.) around .25 
and autocorrelation about .2 on all of the links. Thus, we can study the performance 
of the active tomography methods under more realistic scenarios. 

We used the topology shown in Figure [8] with a multicast transmission scheme. 
The capacity of all links was set to the same size (100 Mbits/sec), with 11 sources 
(10 TCP and one UDP) generating background traffic. The UDP source sent 210 
byte long packets at a rate of 64 kilobits per second with burst times exponentially 
distributed with mean .5s, while the TCP sources sent 1,000 byte long packets 
every .02s. The main difference between these two transmission protocols is that 
UDP transmits packets at a constant rate while TCP sources linearly increase their 
transmission rate to the set maximum and halve it every time a loss is recorded. 
The length of the simulation was 300 seconds, with probe packets 40 bytes long 
injected into the network every 10 milliseconds for a total of about 3,000. Finally, 
the buffer size of the queue at each node (before packets are dropped and losses 
recorded) was set to 50 packets. 

We studied only the continuous component of the delay distribution, i.e., the 
portion of the path-level data that contain zero or infinite delays was removed. 
The traffic-generating scenario described above resulted in approximately uniform 
waiting times in the queue (see Figure [5]). This is somewhat unrealistic in real 



network situations where traffic tends to be fairly bursty llj, but it provides a 
simple scenario for our purposes. Estimating the unknown parameters for this model 
is equivalent to estimating the maximum waiting time for a random packet. 

Figure [10] shows quantile-quantile plots using simulated values from the fitted 
distributions versus the observed ns-2 delays for both the links and paths. Specifi- 
cally, we estimated the parameters for the uniform distributions using the moment 
estimation procedure and then generated data based on these parameter values. 
The fitted values were: b = [.89, .79, .53, 1.10, 1.09, 1.13]. The estimation procedure 
does quite well on all of the links except for the interior link 3. The algorithm seems 




4 5 6 

Fig 8. Portion of the UNC network used in the ns-2 simulaton scenario. 
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Fig 9. Histogram of link delays for the ns-2 simulation. 



to compensate for this under-estimation (about 40%) by slightly overestimating the 
parameters for each of the descendants of link 3, as evidenced by the closely matched 
quantiles for the end-to-end data. This error is probably the result of several fac- 
tors. First, link 3 deviates the most from the uniform distribution with the last 
bin in Figure [5] being too thin. Secondly the algorithm appears to be moderately 
affected by the violations of the independence assumptions, particularly the spatial 
dependence among the children of link 3. This could likely be somewhat relieved 
by using a larger sample size and accounting for the empty queue probabilities. 
Nevertheless, the estimation performs well overall. 



5. Summary 

There are a number of other interesting problems that arise in active network to- 
mography. There are usually multiple source nodes, which raises the issue of how to 
optimally design flexicast experiments for the various sources. We have also assumed 
that the logical topology of the tree is known. However, only partial knowledge of 
the network topology is typically available, and one would be interested in using 
the path level information to simultaneously discover the topology and estimate 
the parameters of interest. The topology discovery problem is computationally dif- 
ficult (NP-hard), but methods using a Bayesian formulation as well as those based 
on clustering ideas have been proposed in the literature (see Q and references 
therein). 

Active tomography techniques are useful for monitoring network quality of ser- 
vice over time. However, this application requires that path measurements are col- 
lected sequentially over time and appropriately combined. The probing intensity, 
the type of control charts to be used for monitoring purposes, and the use of path- 
vs link- information are topics of current research. 
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Fig 10. QQ plots for the links and paths of the ns-2 simulator example. The fitted quantiles come 
from simulating delays from distributions with parameters given by the fitted values of the data. 
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